A solution to a 3-satisfiability (3-SAT) formula can be expanded into a cluster, all other solutions of which are reachable from this one through a sequence of single-spin flips. Some variables in the solution cluster are frozen to the same spin values by one of two different mechanisms: frozen-core formation and long-range frustrations. While frozen cores are identified by a local whitening algorithm, long-range frustrations are very difficult to trace, and they make an entropic belief-propagation (BP) algorithm fail to converge. For BP to reach a fixed point, the spin values of a tiny fraction of variables (chosen according to the whitening algorithm) are externally fixed during the iteration. From the calculated entropy values, we infer that, for a large random 3-SAT formula with constraint density close to the satisfiability threshold, the solutions obtained by the survey-propagation or the walksat algorithm belong neither to the most dominating clusters of the formula nor to the most abundant clusters. This work indicates that a single solution cluster of a random 3-SAT formula may have further community structures.
I. INTRODUCTION

The K-satisfiability (K-SAT) problem is a prototypical constraint satisfaction problem in the nondeterministic polynomial complete (NP-complete) complexity class [1]. Statistical physicists became interested in this computer science problem after the discovery, in the early 1990s, of phase-transition phenomena in the ensemble of random 3-SAT formulas. Randomly generated 3-SAT formulas were found to be either almost always satisfiable or almost always unsatisfiable depending on whether or not the constraint density $\alpha$ [defined by Eq. (3) below] is lower than a critical value $\alpha_s$ (the SAT-UNSAT transition point) [...]. Furthermore, the random 3-SAT formulas whose satisfiability is most difficult to resolve all have constraint densities close to the critical value $\alpha_s$ [...]. A great deal of theoretical work (see, e.g., Refs. [...]) has been done to understand the satisfiability transition in the random K-SAT problem and the rapid increase of resolution time as the constraint density $\alpha$ approaches $\alpha_s$.

In the SAT phase with constraint density $\alpha$ close to $\alpha_s$, the solution space of a typical large random K-SAT formula ($K \geq 3$) can be grouped into many clusters. The solution clusters are not homogeneous in size: some clusters may contain many more solutions than others. Therefore the solution clusters are characterized by a (continuous or discontinuous) spectrum of entropy densities [..., 14, 15]. On the other hand, it is not clear whether different solution clusters are separated by high energy barriers or whether they can be reached one from the other through paths of low-energy intermediate partial solutions. This is one of the major open questions concerning the organization of the solution space of a random K-SAT formula. In this connection, it was recently realized that clustering of the solution space in the random K-SAT problem does not pose a real difficulty for heuristic local search algorithms [..., 13, ..., 19, 20]. Algorithms such as GSAT, walksat and ChainSAT [...] appear to be capable of efficiently escaping from valleys in the energy landscape of a random K-SAT formula. These empirical observations led to the conjecture that what really makes finding a satisfying solution hard is the presence of frozen variables (see, e.g., Ref. [21] and more recent papers [..., 13]). A frozen variable in a solution cluster is a variable which takes the same spin value in all the solutions of the cluster. If a finite fraction of variables are frozen in a given solution cluster, it was argued that it would be difficult for a local algorithm to assign values to all these variables, and that such solutions would be hard to find [25, 26]. The freezing transition for the random K-SAT problem can in principle be estimated by the entropic cavity method of statistical mechanics [...], but extensive mean-field population dynamics simulations [...] are needed. For the random 3-SAT problem, the known quantitative estimate of the freezing transition point comes from a finite-size scaling analysis of exact enumeration results [23].

Earlier mean-field theoretical studies [..., 13] of the K-SAT problem have focused on ensemble-averaged properties. In this work, we take a complementary approach and investigate the properties of solution clusters that are associated with single reference solutions. This study has been driven by two main motivations: first, we wish to know in more detail the 'local structures' of the solution space of the random 3-SAT problem, which might be invisible in ensemble studies; and second, we wish to know whether the solutions obtained by the survey-propagation and the walksat algorithms for a given random 3-SAT formula are contained in the dominating solution clusters of this formula. Given an initial satisfying solution for a 3-SAT formula, we use the whitening algorithm of Parisi [28] (see also Refs. [...]) to determine which variables are frozen (i.e., take the same spin value) in the associated solution cluster. A simple mean-field formula [Eq. (5)] is also given, which predicts with high precision the fraction of frozen variables in planted solutions for the type-B random 3-SAT formulas studied in this paper. We point out that, even if a reference solution is completely whitable by the whitening algorithm, some variables in the associated solution cluster may still be frozen. This is because variable freezing can be caused by another, independent mechanism, namely long-range frustration among unfrozen variables, as discussed in Refs. [30, 31]. When the neighboring unfrozen variables of a variable $i$ are frustrated over long ranges, this variable will very probably be frozen. Two heuristic algorithms are constructed to identify variables that are frozen due to long-range frustrations. The entropy of the solution cluster associated with a given reference solution is calculated by the entropic cavity method, taking the reference solution as the initial condition for the set of zero-energy belief-propagation (BP) iterative equations [Eqs. (9) and (11)]. The entropy values reported in this paper are consistent with the mean-field results of Ref. [...].

For large random 3-SAT formulas with constraint densities close to $\alpha_s$, we find that if solutions obtained by the survey-propagation (SP) [..., 13], the walksat [13], or the belief-propagation-guided decimation [..., 13] algorithm are used as initial conditions, the entropic BP algorithm always fails to reach a fixed point. Besides the ensemble of completely random 3-SAT formulas, a set of large random 3-SAT formulas containing a prespecified satisfying solution is also studied, and for each of them several additional solutions are obtained by the SP algorithm and the walksat algorithm. For a 3-SAT formula in this second ensemble, if the entropic BP iterative equations are run with the planted solution as the initial condition, a fixed point is quickly reached, but if a solution obtained by the SP or the walksat algorithm is used as the initial condition, the iterative equations again fail to converge. This observation suggests that planted solutions and solutions obtained by the SP algorithm are quite different. In cases where the BP algorithm fails to reach a fixed point, we fix the spin values of a tiny fraction of variables and then re-run the BP iterative equations. The modified BP iteration process converges if this set of externally fixed variables is chosen according to the outcome of the whitening program.

In the remaining part of the paper we work exclusively on the random 3-SAT problem, but the illustrated approach should be directly applicable to more general cases. The following section lists the ensembles of random 3-SAT formulas used in this work. In Sec. III we investigate the whitening algorithm and present a mean-field formula to describe the freezing transition in a cluster of solutions; the SpinFlip algorithm and another heuristic search algorithm are also introduced there to search for frozen variables in a completely white solution. The entropy of the solution cluster associated with a planted solution is calculated by the entropic BP algorithm in Sec. IV. For solution clusters associated with single SP or walksat solutions, their entropy values and fractions of frozen variables are calculated in Sec. V by combining BP with the whitening program. We conclude this work in Sec. VI.



II. GENERATION OF SATISFIABLE RANDOM 3-SAT FORMULAS 



A K-SAT formula contains $N$ variables and $M$ constraints (clauses). Each of the $N$ variables $(i, j, k, \ldots)$ has a binary spin state $\sigma_i \in \{-1, +1\}$. Each of the $M$ constraints $(a, b, c, \ldots)$ involves $K$ different variables $(i_1^a, i_2^a, \ldots, i_K^a)$ and prohibits these variables from taking one specified pattern $(-J_a^{i_1^a}, -J_a^{i_2^a}, \ldots, -J_a^{i_K^a})$ out of the $2^K$ possible spin patterns of length $K$. An energy function can be defined for a given K-SAT formula as

$$E(\sigma_1, \sigma_2, \ldots, \sigma_N) = \sum_{a=1}^{M} \prod_{i \in \partial a} \frac{1 - J_a^i \sigma_i}{2}, \qquad (1)$$

where $\partial a$ denotes the set of variables involved in constraint $a$. Each factor $(1 - J_a^i \sigma_i)/2$ equals 1 if $\sigma_i = -J_a^i$ and 0 otherwise, so a clause contributes 1 exactly when it is violated. Hence, for a given spin configuration $\sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_N\}$, the energy $E(\sigma)$ is equal to the number of unsatisfied clauses, and the zero-energy configurations (if any exist) of Eq. (1) correspond to the solutions of the K-SAT formula. A K-SAT formula has a convenient factor graph representation (see the example shown in Fig. 1): variables are denoted by circular nodes and constraints by square nodes, and there is an edge between a constraint node $a$ and a variable node $i$ if and only if variable $i$ participates in constraint $a$. The edge $(a, i)$ is solid if $J_a^i = 1$ and dashed if $J_a^i = -1$.
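To make Eq. (1) concrete, the following minimal Python sketch evaluates the energy of a configuration. The data layout is our own assumption, not the paper's: a clause is a list of $(i, J_a^i)$ pairs with 0-based variable indices, and a configuration is a list of spins $\pm 1$; the function name `energy` is hypothetical.

```python
def energy(clauses, sigma):
    """Eq. (1): E(sigma) = sum_a prod_{i in da} (1 - J_a^i sigma_i) / 2.

    clauses: list of clauses, each a list of (i, J) pairs with J = J_a^i in {-1, +1}
    sigma:   list of spins sigma_i in {-1, +1}
    """
    E = 0
    for clause in clauses:
        term = 1
        for i, J in clause:
            # (1 - J*sigma_i) // 2 is 1 if sigma_i = -J_a^i (literal violated), else 0
            term *= (1 - J * sigma[i]) // 2
        E += term  # term == 1 only if every literal of the clause is violated
    return E


# Clause a with J_a = (+1, -1, +1) on variables (0, 1, 2) prohibits (-1, +1, -1).
clauses = [[(0, +1), (1, -1), (2, +1)]]
print(energy(clauses, [-1, +1, -1]))  # 1: the prohibited pattern
print(energy(clauses, [+1, +1, -1]))  # 0: a solution
```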

In the present paper, we focus on random K-SAT formulas with $K = 3$. To generate a random 3-SAT formula, $M$ different triplets $(i, j, k)$ are randomly chosen from the $N(N-1)(N-2)/6$ possible triplets of variable nodes. A constraint $a$ is applied to each selected triplet $(i, j, k)$, and it prohibits the simultaneous spin assignment $(\sigma_i = -J_a^i) \wedge (\sigma_j = -J_a^j) \wedge (\sigma_k = -J_a^k)$, where $\wedge$ means logical AND. We use two different ways to generate the prohibited patterns $(-J_a^i, -J_a^j, -J_a^k)$ for the $M$ constraints, which we call the type-A and type-B formulas (see below). For a given satisfiable 3-SAT formula, let us denote a particular solution as $\sigma^* = \{\sigma_1^*, \sigma_2^*, \ldots, \sigma_N^*\}$. Since $\sigma^*$ is compatible with all the constraints of the formula, for each constraint $a$ the associated edge vector

$$\mathbf{J}_a \equiv (J_a^i \sigma_i^*, J_a^j \sigma_j^*, J_a^k \sigma_k^*) \qquad (i, j, k \in \partial a) \qquad (2)$$

can have at most two negative elements (a positive element means that the corresponding variable satisfies clause $a$). The clauses of the 3-SAT formula can therefore be grouped into three types with respect to the reference solution $\sigma^*$, and we denote by $q_0$, $3q_1$, and $3q_2$ respectively the fractions of constraints $a$ whose edge vector has zero, one, and two negative elements. Obviously,

$$q_0 + 3q_1 + 3q_2 = 1.$$
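The classification of clauses by their edge vectors can be sketched as follows, using the same hypothetical layout (a clause is a list of $(i, J_a^i)$ pairs; the function name is ours). It returns $q_0$, $q_1$, $q_2$ so that the normalization above holds by construction.

```python
from collections import Counter


def clause_type_fractions(clauses, sigma_star):
    """Return (q0, q1, q2): the edge vector J_a of Eq. (2) has 0, 1 or 2
    negative elements for a fraction q0, 3*q1, 3*q2 of the clauses."""
    counts = Counter()
    for clause in clauses:
        edge_vector = [J * sigma_star[i] for i, J in clause]  # Eq. (2)
        n_neg = edge_vector.count(-1)
        if n_neg == 3:
            raise ValueError("sigma_star violates a clause")
        counts[n_neg] += 1
    M = len(clauses)
    return counts[0] / M, counts[1] / (3 * M), counts[2] / (3 * M)


clauses = [
    [(0, +1), (1, +1), (2, +1)],  # satisfied by all three variables
    [(0, +1), (1, +1), (3, -1)],  # one negative element
    [(1, +1), (2, -1), (3, -1)],  # two negative elements: satisfied by one variable
]
q0, q1, q2 = clause_type_fractions(clauses, [+1, +1, +1, +1])
print(q0, q1, q2, q0 + 3 * q1 + 3 * q2)  # the last value is 1 up to rounding
```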

For the first ensemble of random formulas used in this paper (type-A formulas), the prohibited spin pattern $(-J_a^i, -J_a^j, -J_a^k)$ of each clause $a$ is independently and uniformly chosen at random from the eight possibilities. Such random formulas are satisfiable with high probability as long as the constraint density $\alpha$, defined by

$$\alpha = M/N, \qquad (3)$$

is less than $\alpha_s \approx 4.267$. For each constraint density $\alpha \in \{4.20, 4.21, 4.22, 4.23, 4.24, 4.25\}$, we generate a set of random 3-SAT formulas of $N = 10^{\ldots}$ variables; for each of these formulas, we use the survey-propagation algorithm [...] (downloaded from Riccardo Zecchina's webpage) to obtain five different satisfying solutions. For $\alpha = 4.20$ we also use the walksat algorithm (version 45, downloaded from the walksat homepage) with optimized noise parameter ($p = 0.57$ [18]) to obtain another set of solutions. The solutions serve as initial conditions for the whitening and belief-propagation simulations of the next two sections. For the benefit of later discussions, we refer to a solution obtained by the SP algorithm as an SP solution, and a solution obtained by walksat as a walksat solution.
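A minimal sketch of the type-A construction is given below. It assumes Python's `random` module as the source of randomness and rounds $\alpha N$ to the nearest integer to obtain $M$; the function name is hypothetical, and solving the resulting formulas (done with SP and walksat in the paper) is not attempted here.

```python
import random


def random_type_a_formula(N, alpha, seed=None):
    """Type-A random 3-SAT: M = round(alpha * N) distinct triplets (i, j, k), each
    with a prohibited pattern (-J_a^i, -J_a^j, -J_a^k) drawn uniformly from the 8
    possibilities. Each clause is returned as a list of (i, J_a^i) pairs."""
    rng = random.Random(seed)
    M = round(alpha * N)
    triplets = set()
    while len(triplets) < M:  # rejection sampling of distinct triplets
        triplets.add(tuple(sorted(rng.sample(range(N), 3))))
    # choosing each J_a^i independently as +1 or -1 is uniform over the 8 patterns
    return [[(i, rng.choice((-1, +1))) for i in t] for t in sorted(triplets)]


formula = random_type_a_formula(1000, 4.2, seed=1)
print(len(formula))  # M = 4200 clauses
```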

The second ensemble of satisfiable random formulas (type-B formulas) used in this paper is constructed in such a way that a pre-given spin configuration $\sigma^*$ is a solution. Such ensembles with planted solutions have been investigated earlier in the literature [...], and are known to have properties different from those of standard random K-satisfiability (see, e.g., [...]). For each constraint $a$ of the formula, the value of its edge vector $\mathbf{J}_a$ as defined by Eq. (2) is assigned according to the following rule [33]: a uniformly distributed random variable $r \in [0, 1)$ is first generated; if $r < q_0$, then $\mathbf{J}_a$ is set to $(+1, +1, +1)$; if $q_0 \leq r < q_0 + 3q_1$, then $\mathbf{J}_a$ is chosen uniformly at random from the set $\{(+1, +1, -1), (+1, -1, +1), (-1, +1, +1)\}$; otherwise $\mathbf{J}_a$ is chosen uniformly at random from the set $\{(+1, -1, -1), (-1, +1, -1), (-1, -1, +1)\}$. For simplicity and without loss of generality, in this paper we set the pre-given spin configuration to be $\sigma^* = (+1, +1, \ldots, +1)$ when constructing type-B random 3-SAT formulas.
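The type-B rule can likewise be sketched as below, again with a hypothetical function name and clause layout. The values of $q_0$ and $q_1$ are free inputs here ($q_2$ then follows from $q_0 + 3q_1 + 3q_2 = 1$), since this section does not state which values were used. Because $\sigma^* = (+1, \ldots, +1)$, the edge vector $\mathbf{J}_a$ coincides with $(J_a^i, J_a^j, J_a^k)$, so every clause keeps at least one $+1$ entry and the planted configuration is a solution.

```python
import random

NEG1 = [(+1, +1, -1), (+1, -1, +1), (-1, +1, +1)]  # one negative element
NEG2 = [(+1, -1, -1), (-1, +1, -1), (-1, -1, +1)]  # two negative elements


def random_type_b_formula(N, alpha, q0, q1, seed=None):
    """Type-B random 3-SAT with planted solution sigma* = (+1, ..., +1).
    Each clause is returned as a list of (i, J_a^i) pairs."""
    if q0 < 0 or q1 < 0 or q0 + 3 * q1 > 1:
        raise ValueError("need q0, q1 >= 0 and q0 + 3*q1 <= 1")
    rng = random.Random(seed)
    M = round(alpha * N)
    triplets = set()
    while len(triplets) < M:
        triplets.add(tuple(sorted(rng.sample(range(N), 3))))
    clauses = []
    for t in sorted(triplets):
        r = rng.random()  # uniform in [0, 1)
        if r < q0:
            J = (+1, +1, +1)
        elif r < q0 + 3 * q1:
            J = rng.choice(NEG1)
        else:
            J = rng.choice(NEG2)
        clauses.append(list(zip(t, J)))
    return clauses


formula = random_type_b_formula(300, 4.0, 1 / 7, 1 / 7, seed=2)
print(all(any(J == +1 for _, J in c) for c in formula))  # True: sigma* satisfies all
```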



III. FREEZING OF VARIABLES IN A SOLUTION CLUSTER: TWO DIFFERENT MECHANISMS 

Starting from a given solution $\sigma^*$ of a satisfiable 3-SAT formula $F$, one can (in principle) build a connected network of solutions which contains as many solutions of formula $F$ as possible. In this network, two solutions $\sigma^1$ and $\sigma^2$ are regarded as directly connected if and only if they differ in the spin value of a single variable. From one solution of the network one can reach any other solution of the same network by a sequence of single-spin flips (within the whole solution space of formula $F$). We refer to such a connected network of solutions as a solution cluster (or simply cluster) of formula $F$. The spins of some variables of the formula may take the same value in all the solutions of the cluster. Such variables are referred to as frozen variables; they are strongly constrained in the solution cluster.
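For very small formulas the cluster just defined can be enumerated exhaustively. The sketch below is our own illustration (hypothetical function names, clauses as lists of $(i, J_a^i)$ pairs) and is feasible only for a handful of variables; the large instances of this paper require the heuristics described later. It grows the cluster of $\sigma^*$ by breadth-first search over satisfying single-spin flips and reports the frozen variables.

```python
from collections import deque


def is_solution(clauses, sigma):
    # a clause is satisfied iff at least one literal has J_a^i * sigma_i = +1
    return all(any(J * sigma[i] == 1 for i, J in c) for c in clauses)


def solution_cluster(clauses, sigma_star):
    """All solutions reachable from sigma_star through satisfying single-spin flips."""
    if not is_solution(clauses, sigma_star):
        raise ValueError("sigma_star is not a solution")
    start = tuple(sigma_star)
    cluster = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for i in range(len(s)):
            t = s[:i] + (-s[i],) + s[i + 1:]  # flip spin i
            if t not in cluster and is_solution(clauses, t):
                cluster.add(t)
                queue.append(t)
    return cluster


def frozen_variables(cluster):
    """Variables taking the same spin value in every solution of the cluster."""
    n = len(next(iter(cluster)))
    return {i for i in range(n) if len({s[i] for s in cluster}) == 1}


# Three clauses forbidding (-1,+1,+1), (+1,-1,+1) and (+1,+1,-1) on variables 0, 1, 2.
cyc = [[(0, +1), (1, -1), (2, -1)], [(1, +1), (0, -1), (2, -1)], [(2, +1), (0, -1), (1, -1)]]
print(frozen_variables(solution_cluster(cyc, [+1, +1, +1])))   # {0, 1, 2}: isolated solution
print(len(solution_cluster(cyc, [-1, -1, -1])))                # 4 solutions, none frozen
```

The example formula has two clusters: $(+1, +1, +1)$ on its own, with every variable frozen, and a four-solution cluster around $(-1, -1, -1)$ with no frozen variables.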

FIG. 2: (Color online) The fraction of nonwhite variables in a satisfying solution as a function of the parameter $z$ as defined in Eq. (4). The solid line is the mean-field prediction Eq. (5), while the square symbols are the results obtained by averaging over 50 type-B random 3-SAT formulas of size $N = 10^{\ldots}$. (Inset) The probability for a satisfying solution to be completely whitable. The data points are obtained by averaging over more than 1200 randomly generated type-B 3-SAT formulas.

There are two different mechanisms which cause freezing of variables in a solution cluster. The first mechanism, which we refer to as 'frozen-core formation', can be described using the following whitening process [...]. Starting from a given reference solution $\sigma^*$ of a K-SAT formula, at step $t = 0$ of the whitening process, all the clauses $a$ which are simultaneously satisfied by at least two variables of solution $\sigma^*$ are marked as white, and all the variables which do not satisfy any clause or satisfy only white clauses are marked as white, while the remaining clauses and variables are all marked as nonwhite. Then at each following step $t$ of the whitening process: (1) all the nonwhite clauses which are connected to at least one white variable are marked as white; and then (2) all the nonwhite variables which satisfy only white clauses are marked as white. The whitening process stops at step $t \geq 1$ if the number of newly whitened clauses (and variables) is zero. After this whitening process has finished, if a variable $i$ is left nonwhite, one can prove that it is impossible to travel from $\sigma^*$ to another satisfying configuration with $\sigma_i = -\sigma_i^*$ using only satisfying single-spin flips [36]. In other words, the spin of a nonwhite variable node $i$ is frozen to $\sigma_i^*$. The set of nonwhite variables in the reference solution $\sigma^*$ forms one or several frozen cores. For a variable $i$ in such a frozen core, there exists at least one clause $a$ of the formula which is satisfied only by variable $i$ in the configuration $\sigma^*$ and which is either not connected to other variables or is connected only to other variables belonging to the same frozen core as $i$. As the constraint density $\alpha$ increases, the freezing of variables due to frozen-core formation is therefore a phenomenon of bootstrap percolation.
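The whitening rules described above translate almost line by line into code. The sketch below is our own minimal implementation (the function name `whiten` and the clause layout as lists of $(i, J_a^i)$ pairs are assumptions), not the program used for the paper. Variable $i$ "satisfies" clause $a$ when $J_a^i \sigma_i^* = +1$; the sweep applies steps (1) and (2) until nothing changes and returns the nonwhite variables, which are frozen at their values in $\sigma^*$.

```python
from collections import defaultdict


def whiten(clauses, sigma):
    """Return the set of nonwhite (frozen-core) variables of the solution sigma."""
    n = len(sigma)
    # satisfying[a]: variables of clause a whose literal is satisfied by sigma
    satisfying = [[i for i, J in c if J * sigma[i] == 1] for c in clauses]
    sat_by = defaultdict(list)  # sat_by[i]: clauses satisfied by variable i
    for a, sat in enumerate(satisfying):
        for i in sat:
            sat_by[i].append(a)
    # step t = 0
    clause_white = [len(sat) >= 2 for sat in satisfying]
    var_white = [all(clause_white[a] for a in sat_by[i]) for i in range(n)]
    changed = True
    while changed:
        changed = False
        # (1) nonwhite clauses connected to at least one white variable become white
        for a, c in enumerate(clauses):
            if not clause_white[a] and any(var_white[i] for i, _ in c):
                clause_white[a] = True
                changed = True
        # (2) nonwhite variables that satisfy only white clauses become white
        for i in range(n):
            if not var_white[i] and all(clause_white[a] for a in sat_by[i]):
                var_white[i] = True
                changed = True
    return {i for i in range(n) if not var_white[i]}


cyc = [[(0, +1), (1, -1), (2, -1)], [(1, +1), (0, -1), (2, -1)], [(2, +1), (0, -1), (1, -1)]]
print(whiten(cyc, [+1, +1, +1]))                      # {0, 1, 2}: a frozen core
print(whiten([[(0, +1), (1, +1), (2, +1)]], [1, 1, 1]))  # set(): completely white
```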

The final result of the whitening process is actually independent of the order in which the variables are whitened [28]. It then follows that, in a solution cluster of a K-SAT problem, if one of the solutions can be completely whitened, then all the other solutions are also completely whitable. To prove this, let us suppose the contrary, i.e., that there exist two solutions of the same cluster, $\sigma^1$ and $\sigma^2$, with $\sigma^1$ being completely whitable and $\sigma^2$ containing a nonempty set $A$ of nonwhite variables. One can change $\sigma^2$ into $\sigma^1$ by following a path of single-spin flips, and at each hop the flipped variable is whitened. During this transition process no variable of set $A$ is flipped or whitened, as they are frozen variables. After $\sigma^1$ is reached, as it is completely whitable, the remaining variables (including those in set $A$) can all be whitened starting from the partially whitened pattern. Therefore $\sigma^2$ must also be completely whitable, and set $A$ must be empty.

FIG. 3: Frustration effect in a completely white solution $\{\sigma_i = 1, \sigma_j = 1, \sigma_k = 1, \sigma_l = 1\}$ for a 3-SAT formula of $N = 4$ variables and $M = 5$ constraints. Variable $l$ is frozen to the spin value $\sigma_l = 1$.

We denote the number of nonwhite variables in the solution $\sigma^*$ of a random 3-SAT formula as $N_{nw}$. If the three types of clauses mentioned in Sec. II are randomly distributed in the formula, a very simple equation can be obtained for the fraction of nonwhite variables $p_{nw} = N_{nw}/N$. Consider a randomly chosen variable node $i$. This node in its spin state $\sigma_i^*$ satisfies some clauses, among which $n_i$ clauses are satisfied only by node $i$. The total number of clauses in the 3-SAT formula which are satisfied by only one variable of the configuration $\sigma^*$ is equal to $zN$, with $z$ being expressed as

$$z = 3 q_2 \alpha, \qquad (4)$$

where $q_2$ was defined in Sec. II. Therefore, for a large formula with $N \gg 1$, the integer $n_i$ is distributed according to the Poisson distribution $P(n_i) = e^{-z} z^{n_i}/n_i!$. Variable node $i$ will be nonwhite if, among these $n_i$ neighboring clauses, there is at least one clause $a$ whose other two connected variable nodes are both nonwhite. Averaging the probability $1 - (1 - p_{nw}^2)^{n_i}$ of this event over the Poisson distribution of $n_i$, we find that the probability of a randomly chosen variable node $i$ being nonwhite is determined by the following self-consistent equation:

$$p_{nw} = 1 - \exp\left(-z\, p_{nw}^2\right). \qquad (5)$$

For $z$ less than a critical value $z_{nw} = 2.45541$, Eq. (5) has only the trivial solution $p_{nw} = 0$, while for $z > z_{nw}$ another stable positive solution of Eq. (5) appears, with $p_{nw} > 0.715332$. The freezing transition at $z = z_{nw}$ is a first-order bootstrap transition.
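Equation (5) and the location of the transition can be checked numerically with a few lines of Python. The sketch below is our own (standard library only, hypothetical function names). It iterates Eq. (5) from $p = 1$, which converges to the largest stable root, and locates $z_{nw}$ from the tangency condition: eliminating $z$ from $p = 1 - e^{-zp^2}$ and $1 = 2zp\,e^{-zp^2}$ gives $z = 1/[2p(1-p)]$, with $p$ the nontrivial root of $e^{-p/[2(1-p)]} = 1 - p$.

```python
import math


def p_nonwhite(z, tol=1e-13, max_iter=10**6):
    """Largest stable solution of Eq. (5), p = 1 - exp(-z p^2), by fixed-point
    iteration from p = 1 (convergence slows down close to z_nw)."""
    p = 1.0
    for _ in range(max_iter):
        p_new = -math.expm1(-z * p * p)  # 1 - exp(-z p^2), accurate for small p
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


def critical_point(lo=0.5, hi=0.9, iters=200):
    """Return (z_nw, p_nw) at the tangency point, by bisection on
    g(p) = exp(-p / (2 (1 - p))) - (1 - p), which changes sign on [0.5, 0.9]."""
    def g(p):
        return math.exp(-p / (2.0 * (1.0 - p))) - (1.0 - p)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    p_c = 0.5 * (lo + hi)
    return 1.0 / (2.0 * p_c * (1.0 - p_c)), p_c


z_c, p_c = critical_point()
print(round(z_c, 5), round(p_c, 6))          # 2.45541 0.715332
print(p_nonwhite(2.0) < 1e-6, p_nonwhite(3.0))  # trivial root below z_nw, large root above
```

The recovered values match the $z_{nw}$ and the jump of $p_{nw}$ quoted above; the value $z \approx 2.0$ found below for SP and walksat solutions lies on the trivial branch.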

Equation (5) is confirmed to be valid for planted solutions of type-B random 3-SAT formulas (see Fig. 2). The inset of Fig. 2 shows that, when the parameter $z$ defined in Eq. (4) increases slightly around 2.46, the probability for the planted solution of a random type-B formula to be completely whitable drops quickly from $\approx 1$ to $\approx 0$, and this decrease becomes sharper for larger formulas. The values of the fraction of nonwhite variables obtained from these simulations are in very good agreement with the mean-field prediction Eq. (5). This good agreement indicates that the freezing phenomenon in the planted solutions of random type-B formulas can be completely explained by the formation of frozen cores.

For type-A formulas it is an empirical fact that solutions that can be found efficiently on large instances using algorithms known today are always white [..., 13, ...] (but see more recent simulation results in Ref. [...]). We have generated type-A random 3-SAT formulas with $N = 10^{\ldots}$ and constraint density $\alpha \in [4.20, 4.25]$, and used the SP algorithm to find solutions to these instances. For $\alpha = 4.2$ we used in addition walksat with noise parameter 0.57, since this (and other) stochastic local search heuristics are also known to be effective at these constraint densities [..., 13]. Interestingly, the parameters $q_0$, $q_1$, and $q_2$, which characterize the fractions of constraints satisfied by three, two, or one variable, appear to depend only weakly on the constraint density: for the solutions found by SP, $q_0 \approx 0.128$, $q_1 \approx 0.135$, and $q_2 \approx 0.157$ for all solutions of instances in this range. The solutions found by walksat at $\alpha = 4.2$ also display practically the same values, e.g., $q_2 \approx 0.155$. These solutions of formulas with $N = 10^{\ldots}$ variables all have $z \approx 2.0$ (e.g., $z = 3 q_2 \alpha \approx 3 \times 0.157 \times 4.2 \approx 1.98$), considerably lower than the critical value $z_{nw}$, and all these solutions are found to be completely white. For type-A random formulas of smaller sizes $N = 10^{\ldots}$ to $10^{\ldots}$ and constraint density $\alpha = 4.2$, besides finding many completely white solutions, the walksat algorithm is also able to reach partially frozen solutions that contain a large fraction of frozen variables, if a non-optimal value of the noise parameter (e.g., $p = 0.45$) is used and a long search time is permitted (Lukas Kroc, private communication). These non-completely white solutions also have $z \approx 2.0$. The mean-field formula Eq. (5), which does not consider any correlations in the distributions of the three types of constraints of the studied 3-SAT formula, therefore fails to describe these partially frozen solutions.

Formation of frozen cores is not the only cause of variable freezing in the solution clusters of a K-SAT formula. Figure 3 is a very simple example showing that, even if a solution to a K-SAT formula is completely whitable, it can contain frozen variables. In the solution $\sigma_i = \sigma_j = \sigma_k = \sigma_l = 1$, variable $l$ satisfies three white clauses. The two neighbors ($i$ and $k$) of variable $l$ are both unfrozen variables. If $l$ is flipped to $\sigma_l = -1$, variable $i$ should keep $\sigma_i = 1$ while variable $k$ should be flipped to $\sigma_k = -1$. Variable $i$ then requires variable $j$ to keep the value $\sigma_j = 1$, but variable $k$ requires $j$ to be flipped to $\sigma_j = -1$. Such a frustration therefore prohibits variable $l$ from taking the value $\sigma_l = -1$, although it is whitable in the whitening process. More generally, with respect to a solution $\sigma^*$ of a K-SAT formula, let us denote by $\partial_{\sigma^*} i$ the set of nearest-neighboring clauses of $i$ which are satisfied by the spin value $\sigma_i = \sigma_i^*$, and let us assume that each clause $a \in \partial_{\sigma^*} i$ can be satisfied in some solutions of the cluster associated with $\sigma^*$ even if $\sigma_i = -\sigma_i^*$, provided that all the other clauses $b \in \partial_{\sigma^*} i \backslash a$ are removed from the formula. Then each clause in the set $\partial_{\sigma^*} i$ is certainly whitable in the solution cluster. But variable $i$ will still be frozen to the value $\sigma_i^*$ if not all of these clauses can be simultaneously satisfied without the help of variable $i$. The second mechanism of variable freezing is therefore the closure of frustration loops. In a large random K-SAT formula, because of the existence of extremely many loops and because most of these loops are of length $\log(N)$ or longer, this freezing mechanism is referred to as 'freezing by long-range frustrations'.
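The mechanism can be checked on a tiny formula. The sketch below uses a hypothetical 4-variable, 5-clause formula of our own construction in the spirit of Fig. 3 (not necessarily the formula drawn there), with variables 0, 1, 2, 3 playing the roles of $i, j, k, l$. With $\sigma_l = -1$ the first three clauses reduce to $(i \vee k)$, $(\bar i \vee j)$, $(\bar k \vee \bar j)$, whose only solutions $(\sigma_i, \sigma_j, \sigma_k) = (+1, +1, -1)$ and $(-1, -1, +1)$ are excluded by the last two clauses. Whitening (re-implemented here under the name `nonwhite_variables`) nevertheless whitens every variable.

```python
from itertools import product

# Hypothetical example; each clause is a list of (variable, J_a^i) pairs.
i, j, k, l = 0, 1, 2, 3
clauses = [
    [(l, +1), (i, +1), (k, +1)],
    [(l, +1), (i, -1), (j, +1)],
    [(l, +1), (k, -1), (j, -1)],
    [(i, -1), (j, -1), (k, +1)],
    [(i, +1), (j, +1), (k, -1)],
]
sigma_star = [+1, +1, +1, +1]


def satisfied(clause, sigma):
    return any(J * sigma[v] == 1 for v, J in clause)


def nonwhite_variables(clauses, sigma):
    """Whitening as described in the text; returns the nonwhite variables."""
    n = len(sigma)
    sat = [[v for v, J in c if J * sigma[v] == 1] for c in clauses]
    c_white = [len(s) >= 2 for s in sat]
    v_white = [all(c_white[a] for a, s in enumerate(sat) if v in s) for v in range(n)]
    changed = True
    while changed:
        changed = False
        for a, c in enumerate(clauses):
            if not c_white[a] and any(v_white[v] for v, _ in c):
                c_white[a] = changed = True
        for v in range(n):
            if not v_white[v] and all(c_white[a] for a, s in enumerate(sat) if v in s):
                v_white[v] = changed = True
    return {v for v in range(n) if not v_white[v]}


print(nonwhite_variables(clauses, sigma_star))  # set(): completely whitable
solutions = [s for s in product((-1, +1), repeat=4)
             if all(satisfied(c, s) for c in clauses)]
print(all(s[l] == +1 for s in solutions))       # True: l is frozen by frustration
```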

Long-range frustrations were analysed in previous studies [...] on ensembles of random networks. For single solutions of a given random K-SAT formula, however, finding all the variables which are frozen by long-range frustrations is not a trivial task. In contrast to freezing by frozen-core formation, we are not yet able to construct a polynomial algorithm to identify all the long-range frustrated and frozen variables for a solution of a given K-SAT formula. We leave this challenge to future studies and here present instead two simple stochastic heuristic algorithms.

The first heuristic search algorithm (method 1), SpinFlip, performs a (slightly biased) random walk in the solution cluster of the formula. Starting from a solution $\sigma^*$ of a K-SAT formula $F$, SpinFlip records and updates the current satisfying configuration of the formula and three variable sets: set $V_1$ contains all the variables that have already been flipped at least once (unfrozen variables), set $V_2$ contains all the variables that have not yet been flipped and that are currently flippable, and set $V_3$ contains all the variables that belong to set $V_1$ and that are currently flippable. In each elementary trial of the program, if $V_2$ is not empty, a variable in set $V_2$ is randomly chosen, otherwise a variable in set $V_3$ is randomly chosen; the spin of this variable is flipped and the sets $V_1$, $V_2$ and $V_3$ are then updated. The SpinFlip program is iterated for many steps (each consisting of $N$ consecutive elementary trials) until no new unfrozen variables have been identified in the last $n$ (say $n = 10^{\ldots}$) consecutive steps. As SpinFlip is an incomplete algorithm, it may fail to identify some unfrozen variables of a (completely whitable) solution, but we anticipate that most of the unfrozen variables will be discovered if the program runs for a very long time.
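A compact sketch of SpinFlip as described above follows. It is our own reading, not the authors' program: "flippable" is taken to mean that flipping the spin keeps every clause satisfied (equivalently, the variable is not the only satisfying variable of any clause), a step is $N$ elementary trials, and the run stops after `n_quiet` consecutive steps without a new unfrozen variable; `n_quiet`, `seed` and the function name are hypothetical. For clarity the flippable sets are recomputed at every trial, which costs $O(N)$ per trial and would be far too slow for the instances studied here.

```python
import random


def spinflip(clauses, sigma_star, n_quiet=50, seed=0):
    """Biased random walk in the solution cluster; returns V1, the variables
    found to be unfrozen. Clauses are lists of (i, J_a^i) pairs."""
    rng = random.Random(seed)
    n = len(sigma_star)
    sigma = list(sigma_star)
    occ = [[] for _ in range(n)]  # occ[i]: (clause index, J_a^i) for each clause of i
    for a, clause in enumerate(clauses):
        for i, J in clause:
            occ[i].append((a, J))
    # nsat[a]: number of literals of clause a currently satisfied
    nsat = [sum(1 for i, J in c if J * sigma[i] == 1) for c in clauses]

    def flippable(i):
        # i may flip iff it is not the only satisfying variable of any clause
        return all(not (J * sigma[i] == 1 and nsat[a] == 1) for a, J in occ[i])

    V1, quiet = set(), 0
    while quiet < n_quiet:
        before = len(V1)
        for _ in range(n):  # one step = N elementary trials
            movable = [i for i in range(n) if flippable(i)]
            V2 = [i for i in movable if i not in V1]
            V3 = [i for i in movable if i in V1]
            pool = V2 or V3  # prefer variables that were never flipped
            if not pool:
                break
            i = rng.choice(pool)
            for a, J in occ[i]:
                nsat[a] += -1 if J * sigma[i] == 1 else +1
            sigma[i] = -sigma[i]
            V1.add(i)
        quiet = quiet + 1 if len(V1) == before else 0
    return V1


# Variables 0, 1, 2, 3 = i, j, k, l of a small whitable example in which l is frozen.
ex = [[(3, 1), (0, 1), (2, 1)], [(3, 1), (0, -1), (1, 1)], [(3, 1), (2, -1), (1, -1)],
      [(0, -1), (1, -1), (2, 1)], [(0, 1), (1, 1), (2, -1)]]
print(spinflip(ex, [1, 1, 1, 1]))  # {0, 1, 2}
```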

Besides reporting a set of unfrozen variables, the SpinFlip program can also be used to explore the fine structure of a solution cluster (paper in preparation). The drawback of this random-walk algorithm is its slow rate of discovering new unfrozen variables. The second heuristic search algorithm (method 2) used in this work is much faster. This latter algorithm uses information obtained by the whitening program (i.e., "to flip variable $i$ you probably should first flip variables $j, k, \ldots$"). At each repeat, the algorithm randomly selects a not-yet-flipped variable $i$ and proposes a flip. This may cause some of the neighboring clauses (say $a$) of variable $i$ to be violated. If such a violation happens, a neighboring variable $j$ of clause $a$ is then flipped, with $j$ being selected according to the causality relationships built by the whitening program (there is still some freedom in choosing $j$). This flipping process along each branch stops after a variable which in its original spin value satisfies all its neighboring clauses has been flipped. After the whole iteration process stops, if no clause is violated, the proposed spin flip of variable $i$ is accepted, and all the variables flipped during this process are added to the set of unfrozen variables. As a comparison of this program (method 2) with the SpinFlip program (method 1), we note that, for the example shown in Fig. 5, SpinFlip identifies a total of 249,923 unfrozen variables out of $10^{\ldots}$ variables after running for $10^{\ldots}$ steps on a PC (taking about seven weeks), while the whitening-inspired program is able to identify 265,650 unfrozen variables in a little less than three weeks. A total of 228,167 variables appear in both sets of unfrozen variables. If these two programs are allowed to run even longer, more unfrozen variables will be identified, but the rate of finding new unfrozen variables becomes very low.
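The overlap figures quoted above also determine how many distinct unfrozen variables the two heuristics found together. Writing $U_1$ and $U_2$ (our notation) for the sets of unfrozen variables found by SpinFlip and by the whitening-inspired program, inclusion-exclusion gives

$$|U_1 \cup U_2| = |U_1| + |U_2| - |U_1 \cap U_2| = 249{,}923 + 265{,}650 - 228{,}167 = 287{,}406,$$

of which $249{,}923 - 228{,}167 = 21{,}756$ were found only by SpinFlip and $265{,}650 - 228{,}167 = 37{,}483$ only by the whitening-inspired program.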

IV. THE ENTROPIC BELIEF-PROPAGATION ALGORITHM 

For a given solution $\sigma^*$ of a satisfiable 3-SAT formula, the algorithms mentioned in the preceding section identify the set of frozen variables in the solution cluster of $\sigma^*$. However, these algorithms do not give information about the spin value preference of each unfrozen variable node, nor do they estimate the size of the solution cluster. We now study in more detail the statistical properties of the solution cluster associated with $\sigma^*$ by the cavity method of statistical physics [27].

According to the current statistical physics picture, the satisfying solutions of a random 3-SAT formula with constraint density $\alpha > 3.86$ are distributed into exponentially many clusters, each of which in turn contains an exponential number of solutions. Different solution clusters may have different statistical properties. To characterize such a complex solution space structure, a cavity approach which corresponds to the mean-field first-step replica-symmetry-breaking spin-glass theory [27] was used in Refs. [...]. In the present paper, as we are interested in single solution clusters of a random 3-SAT formula, a replica-symmetric cavity approach is exploited. This cavity approach can be expressed in terms of a set of belief-propagation (BP) iterative equations (see, e.g., Refs. [..., 29, 33]).

Before we write down the BP equations, let us note, however, that the concept of cluster used in this section is not strictly equivalent to that defined in the preceding Sec. III. In the mean-field spin-glass theory, a cluster (also called a macroscopic state) refers to a subspace of the system's configuration space which satisfies the so-called clustering property [23], namely that the spin values of two distantly separated variable nodes are not correlated. In a macroscopic state, the point-to-set correlation between a randomly chosen variable $i$ and the variables separated from $i$ by a shortest-path distance $d$ also decays exponentially with this distance for $d$ large enough (see Refs. [..., 25] for more details). When this clustering property holds, in a given cluster $C$, the joint distribution $P(\sigma_i, \sigma_j, \ldots)$ of the spins of a set of distantly separated variables $(i, j, \ldots)$ can be written in a factorized form:

$$P(\sigma_i, \sigma_j, \ldots) \approx P_i(\sigma_i) P_j(\sigma_j) \cdots \qquad (i, j, \ldots \text{ being far apart}), \qquad (6)$$

where $P_i(\sigma_i)$ is the marginal distribution of spin $\sigma_i$ in cluster $C$. Equation (6) is not necessarily a good approximation for a solution cluster of a satisfiable 3-SAT formula. Nevertheless, it turns out that for a large random 3-SAT formula, which has a very sparse factor graph representation, if the BP iterative algorithm converges to a fixed point, it always predicts the same set of frozen variables as the whitening algorithm does. In this case, the BP approach presumably gives an accurate and comprehensive description of the solution cluster under study.



A. Iterative equations for the entropic belief-propagation algorithm 

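In a solution cluster of a 3-SAT formula $F$, we define $\eta_i$ as the log-likelihood of variable $i$ being in the spin-up state, i.e.,

$$\eta_i \equiv \ln \left[ \frac{P_i(+1)}{P_i(-1)} \right]. \qquad (7)$$

We also define the cavity log-likelihood $\eta_{i\to a}$ as

$$\eta_{i\to a} \equiv \ln \left[ \frac{P_{i\to a}(+1)}{P_{i\to a}(-1)} \right], \qquad (8)$$

where $P_{i\to a}(\sigma_i)$ is the probability for variable $i$ to take the spin $\sigma_i$ if it is not constrained by clause $a$. We denote by $\exp(u_{a\to i})$ the fraction of configurations in the solution cluster in which constraint $a$ is satisfied by its neighboring variables $j$ other than variable $i$. Under the assumption that, in the absence of constraint $a$, the neighboring variable nodes of $a$ are mutually independent, we can write down the following equation for $u_{a\to i}$:

$$u_{a\to i} = \ln \left[ 1 - \prod_{j \in \partial a \backslash i} P_{j\to a}(-J_a^j) \right], \qquad (9)$$

where according to Eq. (8) $P_{j\to a}(-J_a^j)$ is related to $\eta_{j\to a}$ through

$$P_{j\to a}(-J_a^j) = \frac{1}{1 + \exp\left(J_a^j \eta_{j\to a}\right)}. \qquad (10)$$

Similarly, if we use again the factorization assumption for the neighboring clauses of a variable node $i$, we get the following equation for $\eta_{i\to a}$:

$$\eta_{i\to a} = \sum_{b \in \partial i \backslash a:\, J_b^i = -1} u_{b\to i} - \sum_{b \in \partial i \backslash a:\, J_b^i = +1} u_{b\to i}, \qquad (11)$$

since a clause $b$ with $J_b^i = +1$ is satisfied by $\sigma_i = +1$ and contributes the factor $e^{u_{b\to i}}$ only when $\sigma_i = -1$, and conversely for $J_b^i = -1$. In Eqs. (9) and (11), $\partial a \backslash i$ denotes the set of neighboring variables of clause $a$ except $i$, and similarly $\partial i \backslash a$ denotes the set of neighboring clauses of variable $i$ except $a$.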
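Read literally, Eqs. (9) to (11) are three small functions of the incoming messages. The sketch below uses our own function names and assumes finite messages for which $1 - \prod P > 0$; the infinite values of the initial condition would make `math.log` fail, which is why a probability-domain variant is convenient in practice.

```python
import math


def p_unsat(J, eta):
    """Eq. (10): P_{j->a}(-J_a^j) = 1 / (1 + exp(J_a^j * eta_{j->a}))."""
    return 1.0 / (1.0 + math.exp(J * eta))


def u_message(others):
    """Eq. (9): u_{a->i} = ln[1 - prod_{j in da\\i} P_{j->a}(-J_a^j)].
    `others` lists (J_a^j, eta_{j->a}) for the variables of a other than i."""
    prod = 1.0
    for J, eta in others:
        prod *= p_unsat(J, eta)
    return math.log(1.0 - prod)


def eta_message(incoming):
    """Eq. (11): eta_{i->a} = sum_{b: J_b^i=-1} u_{b->i} - sum_{b: J_b^i=+1} u_{b->i}.
    `incoming` lists (J_b^i, u_{b->i}) for the clauses b in di\\a."""
    return sum(u if J == -1 else -u for J, u in incoming)


# Two unbiased neighbours (eta = 0) leave clause a satisfied with probability 3/4.
print(u_message([(+1, 0.0), (+1, 0.0)]) == math.log(0.75))  # True
```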

Equations (9) and (11) form a set of BP iterative equations, which were used in various previous studies (see, e.g., Refs. [...]). As we are interested in the solution cluster associated with a pre-given solution $\sigma^*$, we use the following initial condition for this set of BP equations: on each directed edge from a variable node $i$ to a constraint node $a$, at the beginning of the BP process,

$$\eta_{i\to a} = \begin{cases} +\infty & \text{if } \sigma_i^* = +1, \\ -\infty & \text{if } \sigma_i^* = -1. \end{cases} \qquad (12)$$

Starting from this initial condition, the messages $\{\eta_{i\to a}, u_{a\to i}\}$ along all the edges of the factor graph of the 3-SAT formula are updated according to Eqs. (9) and (11). We have tested both a synchronous and a random sequential BP iteration scheme. In the synchronous updating scheme, in each evolution step, first all the messages $u_{a\to i}$ from clauses to variables are updated using Eq. (9), then all the messages $\eta_{i\to a}$ from variables to clauses are updated using Eq. (11). In the random sequential updating scheme, in each evolution step a random order (say $i_1, i_2, \ldots, i_N$) of the $N$ variable nodes is first drawn; then, for each variable node $i$ in this order, the messages $\eta_{i\to a}$ (with $a\in\partial i$) and then $u_{b\to j}$ (with $b\in\partial i$, $j\in\partial b\setminus i$) are updated. We have checked that in instances for which the synchronous updating scheme does not drive the messages $\{\eta_{i\to a}, u_{a\to i}\}$ to a fixed point, the sequential updating scheme also fails to do so, and vice versa. Whenever both schemes lead to convergence of Eqs. (9) and (11), they always reach the same fixed point. This latter observation indicates that the BP fixed points reached by the iterative equations are stable fixed points. In the numerical simulations, the BP iteration is considered converged when the maximal distance $\Delta$ (over all directed edges $a\to i$ of the graph) between the values of $e^{u_{a\to i}}$ obtained in iteration steps $t$ and $t+1$ is less than a pre-specified value $5\times 10^{\ldots}$:

$$\Delta = \max_{(a,i)} \left| e^{u_{a\to i}^{(t+1)}} - e^{u_{a\to i}^{(t)}} \right| . \qquad (13)$$
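A minimal Python sketch of the synchronous scheme follows. It applies Eqs. (9) to (11), the initial condition of Eq. (12) and the stopping rule of Eq. (13). The data layout (each clause stored as a list of `(i, J)` pairs), the function names and the default tolerance are illustrative assumptions. They are not the implementation used for the results reported here. Infinite cavity fields are represented by `math.inf`.

```python
import math

INF = math.inf


def p_violate(eta, J):
    """P_{j->a}(-J_j^a) = 1 / (1 + exp(J * eta)), Eq. (10), without overflow."""
    z = J * eta
    if z > 0:
        t = math.exp(-z)  # exp(-inf) == 0.0, so eta = +/-inf is handled
        return t / (1.0 + t)
    return 1.0 / (1.0 + math.exp(z))


def safe_log(x):
    """Natural logarithm returning -inf for x <= 0 instead of raising."""
    return -INF if x <= 0.0 else math.log(x)


def run_bp(clauses, sigma_star, max_steps=1000, tol=1e-10):
    """Synchronous entropic BP, Eqs. (9)-(13), started from sigma_star.

    clauses    : list of clauses, each a list of (variable, J) pairs
    sigma_star : dict variable -> +1 or -1, assumed to satisfy every clause
    Returns (converged, u, eta_full, frozen).
    """
    neighbours = {i: [] for i in sigma_star}
    for a, clause in enumerate(clauses):
        for i, J in clause:
            neighbours[i].append((a, J))

    # Initial condition, Eq. (12): every cavity field points to sigma_star.
    eta = {(i, a): sigma_star[i] * INF
           for a, clause in enumerate(clauses) for i, _ in clause}
    u, prev_exp_u, converged = {}, None, False
    for _ in range(max_steps):
        # Clause-to-variable messages, Eq. (9).
        for a, clause in enumerate(clauses):
            for i, _ in clause:
                prod = 1.0
                for j, Jj in clause:
                    if j != i:
                        prod *= p_violate(eta[(j, a)], Jj)
                u[(a, i)] = safe_log(1.0 - prod)
        # Stopping rule, Eq. (13), applied to e^{u} at successive steps.
        exp_u = {key: math.exp(val) for key, val in u.items()}
        if prev_exp_u is not None:
            delta = max((abs(exp_u[k] - prev_exp_u[k]) for k in exp_u), default=0.0)
            if delta < tol:
                converged = True
                break
        prev_exp_u = exp_u
        # Variable-to-clause messages, Eq. (11): J = -1 clauses add u, J = +1
        # clauses subtract it. In exact arithmetic, starting from a true
        # solution, infinite terms all point towards sigma_star (no inf - inf).
        for i, nbrs in neighbours.items():
            for a, _ in nbrs:
                eta[(i, a)] = sum(-Jb * u[(b, i)] for b, Jb in nbrs if b != a)

    # Full log-likelihoods, Eq. (14); +/-inf marks a variable predicted frozen.
    eta_full = {i: sum(-Jb * u[(b, i)] for b, Jb in nbrs)
                for i, nbrs in neighbours.items()}
    frozen = {i for i, h in eta_full.items() if math.isinf(h)}
    return converged, u, eta_full, frozen
```

On a tree-shaped factor graph, such as two clauses sharing one variable, this loop stops after a few sweeps. On large loopy instances it can instead keep fluctuating, which is the kind of non-convergence reported in Sec. V. A random sequential variant would replace the two sweeps by a loop over a shuffled variable order, updating $\eta_{i\to a}$ and then the affected $u_{b\to j}$.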



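After the above-mentioned iteration process has reached a fixed point, the log-likelihood $\eta_i$ of each variable $i$, as defined by Eq. (7), can be calculated by

$$\eta_i = \sum_{a\in\partial i:\,J_i^a=-1} u_{a\to i} \;-\; \sum_{a\in\partial i:\,J_i^a=+1} u_{a\to i} , \qquad (14)$$

and the total entropy $S$ of the solution cluster can be estimated by the following equation:

$$S = \sum_i \Delta S_i + \sum_a \Delta S_a - \sum_{(i,a)} \Delta S_{ia} . \qquad (15)$$

In Eq. (15), $\Delta S_i$, $\Delta S_a$, and $\Delta S_{ia}$ are, respectively, the entropy increase due to the addition of variable node $i$, clause node $a$, and the edge $(i,a)$ between variable $i$ and clause $a$, with

$$\Delta S_i = \log\Big[\exp\Big(\sum_{a\in\partial i:\,J_i^a=-1} u_{a\to i}\Big) + \exp\Big(\sum_{a\in\partial i:\,J_i^a=+1} u_{a\to i}\Big)\Big] , \qquad (16)$$

$$\Delta S_a = \log\Big[1 - \prod_{i\in\partial a} P_{i\to a}(-J_i^a)\Big] , \qquad (17)$$

$$\Delta S_{ia} = \log\Big[1 - \big(1 - e^{u_{a\to i}}\big)\, P_{i\to a}(-J_i^a)\Big] . \qquad (18)$$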
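Given fixed-point messages, Eqs. (15) to (18) can be evaluated directly. The sketch below assumes that each clause is stored as a list of `(i, J)` pairs, and that the cavity fields `eta[(i, a)]` and messages `u[(a, i)]` are held in dictionaries. These names are illustrative, not taken from the authors' code. On a tree-shaped factor graph this expression is exact, which gives a simple check: a single 3-literal clause has $7$ satisfying assignments, so $S = \log 7$.

```python
import math

INF = math.inf


def p_violate(eta, J):
    """P_{i->a}(-J_i^a) = 1 / (1 + exp(J * eta)), Eq. (10), without overflow."""
    z = J * eta
    if z > 0:
        t = math.exp(-z)
        return t / (1.0 + t)
    return 1.0 / (1.0 + math.exp(z))


def safe_log(x):
    """Natural logarithm returning -inf for x <= 0 instead of raising."""
    return -INF if x <= 0.0 else math.log(x)


def log_add_exp(x, y):
    """log(e^x + e^y), robust when x or y is -inf."""
    m = max(x, y)
    if m == -INF:
        return -INF
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def cluster_entropy(clauses, eta, u):
    """Entropy S of Eq. (15) from fixed-point messages eta[(i, a)], u[(a, i)]."""
    neighbours = {}
    for a, clause in enumerate(clauses):
        for i, J in clause:
            neighbours.setdefault(i, []).append((a, J))

    S = 0.0
    # Variable terms, Eq. (16): first sum = log-weight of sigma_i = +1,
    # second sum = log-weight of sigma_i = -1.
    for i, nbrs in neighbours.items():
        w_up = sum(u[(a, i)] for a, J in nbrs if J == -1)
        w_down = sum(u[(a, i)] for a, J in nbrs if J == +1)
        S += log_add_exp(w_up, w_down)
    for a, clause in enumerate(clauses):
        # Clause term, Eq. (17).
        prod = 1.0
        for i, J in clause:
            prod *= p_violate(eta[(i, a)], J)
        S += safe_log(1.0 - prod)
        # Edge terms, Eq. (18), which enter Eq. (15) with a minus sign.
        for i, J in clause:
            S -= safe_log(1.0 - (1.0 - math.exp(u[(a, i)])) * p_violate(eta[(i, a)], J))
    return S
```

For the single clause, the fixed point has $\eta_{i\to a}=0$ and $u_{a\to i}=\log(3/4)$. Feeding these values in returns $\log 7$. The entropy density is then $s = S/N$.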



Following the work of Chertkov and Chernyak […], it can be shown that the entropy expression Eq. (15) corresponds to the zeroth-order term of a loop series for the entropy of the 3-SAT formula. The sparse factor graph of a large random 3-SAT formula contains no short loops. For such a graph, higher-order terms in this loop expansion should not contribute extensively to the total entropy of a solution cluster. The entropy density $s = S/N$ obtained by Eq. (15) is therefore expected to be exact in the thermodynamic limit $N\to\infty$.
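For orientation, the loop series of Chertkov and Chernyak has the following structure. It is stated here only schematically; the explicit weights $r_C$ are given in their work and are not reproduced:

$$S = S_{\mathrm{BP}} + \log\Big(1 + \sum_{C} r_C\Big) ,$$

where $S_{\mathrm{BP}}$ is the expression of Eq. (15). The sum runs over generalized loops $C$ of the factor graph, i.e., subgraphs in which every node has degree at least two, and each $r_C$ is a product of local terms evaluated at the BP fixed point. On a tree there are no generalized loops and the correction vanishes. The argument above is that on sparse random graphs the correction is sub-extensive, so it does not change $s = S/N$ as $N\to\infty$.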



B. Planted solutions as initial conditions for the BP algorithm 

A set of type-B random 3-SAT formulas of size $N = 10^{\ldots}$ and different constraint densities $\alpha > 4.0$ is constructed, each containing a planted satisfying solution $\sigma^*$ (see Sec. II for details). For each of these problem instances, we run BP as described above and find that it always reaches a fixed point starting from the initial condition Eq. (12). Furthermore, the set of frozen variables predicted by the BP algorithm is always identical to the set of nonwhite variables discovered by the whitening algorithm of Sec. III. Here a frozen variable is one with $\eta_i = +\infty$ or $\eta_i = -\infty$ at the fixed point. The convergence of the BP algorithm and the agreement with the whitening algorithm suggest that the above-mentioned replica-symmetric mean-field cavity theory is valid. They also suggest that the planted solution $\sigma^*$ can serve as an appropriate initial condition for the BP algorithm.

Figure 4 shows the BP simulation results for a set of type-B random satisfiable 3-SAT formulas with $N = 10^{\ldots}$ and $\alpha = 4.2$, which on average have equal numbers of initially satisfying and non-satisfying edges. The latter restriction is satisfied by requiring

$$q_1 = (1 - 4q_0)/6 , \qquad q_2 = (1 + 2q_0)/6 , \qquad (19)$$

where $q_0$, $q_1$, and $q_2$ are defined in Sec. II. In this sub-ensemble, the parameter $q_0$ (the fraction of constraints which are satisfied by all three variables in configuration $\sigma^*$) is restricted to $0 \le q_0 \le 0.25$. Figure 4 shows that as $q_0$ increases, the entropy density $S/N$ of the solution cluster decreases continuously. For $q_0 < 0.08$ there are no frozen variables in the system, which is consistent with the prediction of Sec. III. For $q_0 > 0.09$, by contrast, a majority of the variables are frozen, and the fraction of frozen variables agrees with the mean-field prediction of Sec. III. Interestingly, at the freezing transition point $q_0 \approx 0.085$ the entropy density as a function of $q_0$ shows no sign of singularity, while the fraction of frozen variables has a large jump. According to the mean-field cavity theory, the entropy densities of solution clusters in a completely random 3-SAT formula of $\alpha = 4.2$ range from $\approx 0.060$ to $\approx 0.088$ [14]. These values lie within the range shown in Fig. 4.
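Equation (19) can be checked numerically. The sketch below rests on an inference from Eq. (19), since the definitions in Sec. II are not reproduced here: $q_1$ ($q_2$) is taken as the probability of each of the three patterns in which exactly two (one) of a clause's variables satisfy it in $\sigma^*$, so that $q_0 + 3q_1 + 3q_2 = 1$. Under this reading, both the normalization and the balance of satisfying and non-satisfying edges (on average $3/2$ of each per clause) hold for every $q_0\in[0, 0.25]$. The function names are hypothetical.

```python
def type_b_params(q0):
    """q1 and q2 from Eq. (19); q1 >= 0 requires q0 <= 0.25."""
    if not 0.0 <= q0 <= 0.25:
        raise ValueError("Eq. (19) requires 0 <= q0 <= 0.25")
    q1 = (1.0 - 4.0 * q0) / 6.0
    q2 = (1.0 + 2.0 * q0) / 6.0
    return q1, q2


def edge_balance(q0):
    """Return (normalization, mean number of satisfying edges per clause).

    Assumption: three clause patterns with two satisfying variables
    (weight q1 each) and three with one satisfying variable (weight q2 each).
    """
    q1, q2 = type_b_params(q0)
    norm = q0 + 3.0 * q1 + 3.0 * q2
    mean_satisfying = 3.0 * q0 + 2.0 * (3.0 * q1) + 1.0 * (3.0 * q2)
    return norm, mean_satisfying


for q0 in (0.0, 0.085, 0.25):
    print(q0, type_b_params(q0), edge_balance(q0))
```

The upper bound $q_0 = 0.25$ is where $q_1$ reaches zero, which is why the sub-ensemble of Fig. 4 stops there.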

We have applied the whitening-inspired search program (method 2, see Sec. III) to the planted solutions of several type-B 3-SAT formulas. In each of these tests, the search program confirms that all the white variables of the solution are non-frozen. The cluster of a planted solution thus appears to contain no long-range frustrations. This is consistent with the fact that BP always converges when started from planted solutions.

FIG. 4: (Color online) Average entropy density (diamond symbols; the dashed line is a guide to the eye) and fraction of frozen variables (square symbols) for 50 randomly generated type-B 3-SAT formulas with $N = 10^{\ldots}$ and $\alpha = 4.2$. The parameters $(q_0, q_1, q_2)$ satisfy Eq. (19). The solid line is the fraction of nonwhite variables as predicted by the mean-field theory of Sec. III.

V. ENTROPY OF THE CLUSTERS REACHED BY SURVEY-PROPAGATION AND WALKSAT 

We also generate a set of type-A random 3-SAT formulas of size $N = 10^{\ldots}$ with $\alpha\in[4.20, 4.25]$, and for each of them we use the survey-propagation algorithm to find a set of satisfying solutions. For $\alpha = 4.20$ we additionally use walksat as described in Sec. II. For $\alpha = 3.90$ and $3.925$ with $N = 10^{\ldots}$, solutions are also obtained by the belief-propagation-inspired decimation algorithm […, 20]. The BP algorithm is then applied to these instances, using these solutions $\sigma^*$ as initial conditions. For each problem instance, both the synchronous and the sequential BP schemes predict that there are no frozen variables in the system, which is consistent with the result of the whitening algorithm. However, in contrast to the planted-solution results of the preceding section, none of these BP simulations converges to a fixed point of the messages $\{\eta_{i\to a}, u_{a\to i}\}$. The messages along many edges keep fluctuating considerably around certain mean values, and different variables have different amplitudes of $\eta$ and $u$ fluctuations. In the random sequential updating scheme, these fluctuations show no periodic pattern.

If the non-fixed-point messages $\{\eta_{i\to a}, u_{a\to i}\}$ are used to calculate the entropy, Eq. (15) gives an entropy density of $s\approx 0.090$ at $\alpha = 4.2$. This equals the replica-symmetric entropy density calculated in earlier studies (see Fig. 2(b) of Ref. […] or Ref. [3]). We have further checked what happens when the BP iteration starts from the RS initial condition $\{\eta_{i\to a} = 0\}$. In that case, the evolution trajectory of the messages $\{\eta_{j\to b}, u_{b\to j}\}$ on any given edge $(j,b)$ cannot be distinguished from the trajectories starting from an SP- or walksat-solution. The BP algorithm therefore in some sense forgets its starting point and does not provide any cluster-specific information.

The non-convergence of BP is not due to the different 3-SAT ensemble used in this section; it is related to the initial conditions used in BP. To support this claim, we note the behavior of the type-B random formulas with $q_0 > 0.01$ studied in Fig. 4 of the preceding section. When SP- or walksat-solutions are used as initial conditions for BP instead of planted solutions, BP fails to converge on these formulas. At $\alpha = 4.2$ and $q_0 > 0.01$, solutions found by SP or walksat are not in the same cluster as the planted solution. The planted solution corresponds to a 'crystal' phase, while the SP- and walksat-solutions all belong to the 'glassy' phase. The type-B random formulas with $q_0 < 0.01$ used in Fig. 4 are in the replica-symmetric phase (only one solution cluster), and there BP converges to the same fixed point independent of the initial conditions.

As we mentioned in Sec. III, variable freezing can also be caused by long-range frustrations […]. As an example, consider a completely white SP-solution for a type-A 3-SAT formula with $N = 10^6$ variables and constraint density $\alpha = 4.25$. Even after running for seven weeks, the SpinFlip algorithm was able to confirm only 25 percent of its variables as unfrozen. (The whitening-inspired search algorithm performs slightly better: after running for about three weeks, it reported that 26.5 percent of the variables in the same solution are unfrozen.) As BP uses only local structural information of the graph, it is unable to detect these globally constrained variables. Instead, some of the variables in the solution can be externally fixed to make BP converge. For this exemplar solution, Fig. 5(a) plots the number of newly whitened variables at each step of the whitening process and shows how many of them are also identified as unfrozen by the two heuristic programs. At earlier steps ($t < 25$) of the whitening process, most of the whitened variables are confirmed to be unfrozen. At later whitening steps, however, most of the newly whitened variables are very difficult (if not impossible) to flip. This is expected: long-range frustration effects are strongly related to the closure of loops in the graph. Consequently, only variables which are whitened at steps of order $\log N$ (the typical length of a loop) or later can have a large probability of being frozen or extremely constrained. Since the variables whitened in the later steps of the whitening process are extremely difficult to flip, they must be extremely constrained in the reference solution $\sigma^*$. Figure 5(a) then suggests that one way to make BP converge is to externally fix the spin values of the variables which are newly whitened at a certain whitening step (say $t$). If these variables are fixed, the whitening program cannot proceed to the next step $t+1$. BP, which is capable of detecting frozen cores, will then predict all the variables corresponding to steps $\ge t+1$ in Fig. 5(a) to be frozen.

FIG. 5: (Color online) (a) The $N = 10^6$ variables of a random type-A 3-SAT formula with $M = 4.25\times 10^6$ constraints are grouped into 51 sets according to the step $t$ at which they are whitened by the whitening algorithm. The SpinFlip algorithm (method 1) identified 249,923 variables as unfrozen after making $10^{\ldots}\times N$ spin flips (in about seven weeks). The whitening-inspired algorithm (method 2) identified 265,650 variables as unfrozen after running for about three weeks. The distribution of these unfrozen variables in each group is shown. The reference solution is obtained by the SP algorithm. (b) The entropy density and fraction of frozen variables as predicted by the BP algorithm when the spin values of all the variables whitened at step $t$ of the whitening process are externally fixed. BP becomes non-convergent at $t \ge 33$ (marked by the dashed lines).

FIG. 6: The entropy density of an SP-solution-associated solution cluster for a random 3-SAT formula of size $N = 10^{\ldots}$, as a function of constraint density. The fraction of frozen variables in the solution cluster as predicted by BP is also shown. Each data point is obtained by running BP on a single SP-solution with a tiny fraction of variables externally fixed.

Figure 5(b) shows the results of BP after externally fixing a group of variables. When the spin values of the variables corresponding to whitening step $t$ are all externally fixed, the BP iteration converges to a fixed point as long as $t\le 32$. The calculated entropy density increases with $t$ only very slowly, and the predicted number of frozen variables decreases with $t$. BP fails to converge when the externally fixed variables come from whitening steps $t\ge 33$: the parameter $\Delta$ defined in Eq. (13) remains about 0.5 even after $2\times 10^{\ldots}$ iteration steps. We therefore take the value $s = 0.0641$ obtained at $t = 32$ as the entropy density of the solution cluster. The fraction of frozen variables is predicted to be 0.7472 at $t = 32$. From the results of the two heuristic search programs (see Fig. 5(a)), however, we know that the real fraction of frozen variables should be at most 0.713. It appears that more variables must be in the frozen state for BP to converge: if we run the BP iteration by externally fixing all the frozen variables reported by either of the two search programs, BP again fails to converge. It is tempting to interpret this observation as follows. The solution cluster can be divided into many sub-clusters; in each sub-cluster, besides those variables that are frozen in all the sub-clusters, some additional variables are frozen. In a very large random $K$-SAT formula, there are exponentially many combinatorial ways to choose these additional sub-cluster-specific frozen variables. The number of sub-clusters is therefore probably also exponential. These sub-clusters are not separated from each other by any energy barriers; they are formed by correlation properties. This conjectured further structural organization within a single solution cluster will be checked in a future investigation.
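The fix-and-scan procedure behind Fig. 5(b) can be summarized in a short sketch. It assumes that the whitening step at which each variable turns white is already known. The BP solver is a caller-supplied function `run_bp_fixed(fixed)` (hypothetical). It is expected to hold the cavity fields of the variables in `fixed` at $\sigma_i^*\cdot\infty$ throughout the iteration and to return whether BP converged, together with the entropy density and frozen fraction. Following the text, steps are scanned in increasing order, and the result of the last convergent step before the first failure is kept.

```python
from collections import defaultdict


def group_by_whitening_step(step_of):
    """Map each whitening step t to the set of variables first whitened at t."""
    groups = defaultdict(set)
    for var, t in step_of.items():
        groups[t].add(var)
    return dict(groups)


def scan_fixed_steps(step_of, run_bp_fixed):
    """Fix the variables whitened at step t, for increasing t, and keep the
    result of the last step for which BP still reaches a fixed point.

    run_bp_fixed(fixed) -> (converged, entropy_density, frozen_fraction)
    is a hypothetical, caller-supplied BP solver.
    """
    groups = group_by_whitening_step(step_of)
    best = None
    for t in sorted(groups):
        converged, s, frozen = run_bp_fixed(groups[t])
        if not converged:
            break  # from this step on, BP no longer converges
        best = (t, s, frozen)
    return best
```

Because the entropy density is reported to vary only slowly with $t$, the exact choice of step is not critical. The sketch simply encodes the rule used to pick $t = 32$ for the instance of Fig. 5.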

We have applied this combined BP and whitening approach to a set of other type-A random 3-SAT formulas and their SP- and walksat-solutions. Some of the results are reported in Fig. 6. At $\alpha = 4.20$, the entropy densities of the solution clusters reached by SP and walksat are comparable, being $s\approx 0.0725$ and $s\approx 0.0715$, respectively. In comparison, the dominating clusters of the random 3-SAT problem have entropy density $s\approx 0.088$ and the most abundant clusters have entropy density $s\approx 0.060$ [14]. At $\alpha = 4.25$, the clusters reached by SP have entropy density $s\approx 0.064$, while the dominating and most abundant clusters have entropy densities $s\approx 0.068$ and $s\approx 0.060$, respectively [13]. It appears that solutions reached by SP and walksat belong neither to the dominating clusters nor to the most abundant clusters. The same observation was made in Ref. [47] for the random bicoloring problem.
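The entropy $S = Ns$ of a cluster is the logarithm of its number of solutions. These seemingly small differences in $s$ therefore correspond to exponentially large differences in cluster size. Denoting cluster sizes by $\Omega$ and using the values quoted above for $\alpha = 4.20$:

$$\frac{\Omega_{\mathrm{dominating}}}{\Omega_{\mathrm{SP}}} \approx e^{(0.088-0.0725)N} = e^{0.0155N} , \qquad \frac{\Omega_{\mathrm{SP}}}{\Omega_{\mathrm{abundant}}} \approx e^{(0.0725-0.060)N} = e^{0.0125N} .$$

At $\alpha = 4.25$ both ratios are about $e^{0.004N}$. For large $N$, the cluster reached by SP therefore differs in size from both reference cluster types by exponential factors.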

We close this section by mentioning another way of making BP converge to a fixed point, namely damping […]. This strategy was not used in the present work because the physical meaning of the converged fixed point of the damped BP algorithm is not yet clear to us [44]. As explained in this section, externally fixing some variables to make BP converge is physically reasonable: most of the frozen variables predicted by the resulting BP fixed point are highly constrained and very difficult to flip (see Fig. 5(a)). Explicitly fixing these variables helps to remove possible long-range correlations within the solution cluster of the 3-SAT formula. Figure 5(b) demonstrates that the calculated entropy value is not sensitive to the set of variables that are externally fixed. A different strategy was used in Ref. [41] to calculate the entropy of a single solution cluster.

VI. CONCLUSION 

This paper has studied the statistical properties of solution clusters that are associated with single solutions of a random 3-SAT formula. It was pointed out that there are two different mechanisms for the freezing of variables in a solution cluster. Variables which are frozen due to frozen-core formation can easily be identified by both the whitening and the belief-propagation algorithms. Variables which are frozen due to long-range frustrations, however, can be very difficult to identify, as long-range frustrations involve global and topological properties of the 3-SAT formula. A heuristic search algorithm, SpinFlip, was constructed to search for such variables.

When long-range frustrations exist in a solution $\sigma^*$ and the associated cluster of a random 3-SAT formula, the BP iteration process starting from the initial condition $\sigma^*$ is unable to reach a fixed point. To overcome this difficulty, a tiny set of variables (chosen with the help of the whitening algorithm) was externally fixed to their initial spin values during the BP iteration. When this modified BP process reaches a fixed point, the entropy density of the solution cluster is evaluated. It was found that for $4.2\le\alpha\le 4.25$, the solutions obtained by SP or walksat for a given random 3-SAT problem are in medium-sized clusters. Their entropy densities are higher than that of the most abundant clusters of the formula but lower than that of the most dominating clusters.

The present work indicates that at constraint density $\alpha$ close to the satisfiability threshold, a single solution cluster of a random 3-SAT formula can be further divided into sub-clusters. Such further structural organization, if it exists, may be described using the first-step RSB spin-glass cavity theory [23]. Further work along this line will be reported in another paper.